Dynamic patterns of microRNA expression during acute myeloid leukemia state-transition

MicroRNAs (miRNAs) have been shown to hold prognostic value in acute myeloid leukemia (AML); however, the temporal dynamics of miRNA expression in AML are poorly understood. Using serial samples from a mouse model of AML to generate time-series miRNA sequencing data, we are the first to show that the miRNA transcriptome undergoes state-transition during AML initiation and progression. We modeled AML state-transition as a particle undergoing Brownian motion in a quasi-potential and validated the AML state-space and state-transition model to accurately predict time to AML in an independent cohort of mice. The critical points of the model provided a framework to align samples from mice that developed AML at different rates. Our mathematical approach allowed discovery of dynamic processes involved during AML development and, if translated to humans, has the potential to predict an individual’s disease trajectory.

counts were determined using R scripts and miRbase v21 (36). Log normalized counts were again generated from CPM [i.e., log2(CPM+0.01)]; one sample (out of 99 total samples) was removed as an outlier based on poor library quality and abnormal expression patterns.

Identification of the miRNA state-space and correlation with Kit expression
In order to identify which principal component was most associated with AML state-transition, we examined all principal components of the data matrix composed of all time-series miRNA expression data from CM and control samples and correlated them with expression of Kit gene, which is an immunophenotypic marker of AML. We observed that PC1 had both the highest R 2 correlation with Kit expression as well as the lowest p-value of all PCs ( Figure S3, Table S1). PC1 and PC2 accounted for 5 and 4%, respectively, of the total variance present in the data. Kit expression was determined using the matched mRNA sequencing (RNA-seq) for each sample as previously described (13). The percentage of cells that expressed the cKit protein (% cKit+) was determined in each sample using flow cytometry (Table S10). cKit+ was compared to PC1 and time ( Figure S1A-B) and was shown to be similar to Kit mRNA expression ( Figure S1C). To confirm that the miRNA state-space loadings ( 1 * ) were associated with AML, we calculated the correlation between each miRNA expression and % cKit+ for each sample ( Figure S1D; Table  S11). This showed that the miRNA with the largest negative loadings values (positive contribution to AML) were also the most correlated with % cKit+. An unpaired t-test was performed to compare the PC1 components of the CM and control mice at each time point and showed a significant difference between PC1 and % cKit+ cells for timepoints t=1,2,3, and 8 (Table S2).
Other dimensionality reduction methods were investigated to construct the AML state-space and were compared to the SVD-derived state-space ( Figure S11). We observed that the space constructed with diffusion mapping was most similar to the one created with the SVD. This is expected as diffusion mapping uses PCA prior to the application of the diffusion kernel. Nonlinear methods including t-SNE and UMAP did not result in clear separation between the control and CM samples. As a result, the SVD-based state-space was used for the final analysis. The advantage of the SVD-derived state-space is that SVD has no free parameters and maximally preserves the information in the data. All dimensionality reduction algorithms were run with default parameters using the R packages umap v0.2.7.0, Rtsne v0.15, and destiny v3.4.0 for UMAP, t-SNE, and diffusion map respectively.

Pathway analysis of miRNA gene targets
The miRNAs were mapped to experimentally validated gene targets using miRTarBase v8.0 (12). By combining the gene targets with Gene Ontology (GO), KEGG, and WikiPathway databases, pathways and GO terms were converted to their targeting miRNA based on the approach of Godard et al. 2015 (13)(14)(15)(16). A significant limitation of this approach was that there are currently very few experimentally validated miRNA-gene targets for mice. We therefore limited our investigation of the biological role of our findings in order to avoid over interpretation of the pathway and miRNA-gene target results.

Investigating the AML state-space miRNA loading values
The predicted contribution of each miRNA to AML was tested by identifying miRNA that had been reported to be involved in inv(16) AML previously. The reported role of each miRNA was noted and then compared to its predicted contribution based on its loading value; large negative loading values have a positive contribution, large positive values have a negative contribution, and loading values near zero have a small contribution. Seven miRNA were found in the literature, and except for those that had small contributions, all predicted contributions miRNA matched the reported role in the literature (Table S8). To compare the relationship between the miRNA and AML genes reported in Rockne et al. (13), the top 10 miRNA with the highest and lowest contribution to AML were selected from the persistent DE miRNA. The correlation of the expression of these five genes and 20 miRNAs were determined using all CM samples ( Figure  5B). This showed that the miRNA most correlated with the AML genes were those that had a positive contribution to AML (negative PC1 loading values).

Expression dynamics aligned by critical points
Hierarchical clustering was performed on a correlation matrix consisting of mean centered log2 normalized CPM expression for all CM samples. All miRNA that had zero values in half the CM samples were included in the analysis (894 miRNA). Four expression dynamic groups were identified with the top-level bifurcation of the dendrogram produced by the hierarchical clustering. The expression dynamic plots were created by calculating and plotting the mean expression of all miRNA contained in the group for each CM sample. The mean expression values were then plotted vs PC1.

Angle between mRNA and miRNA principal components
The angle between each of the PCs from mRNA and miRNA ( Figure S9A) is computed such that for two vectors a and b, = cos −1 � ⋅ ‖ ‖‖ ‖ � where ‖⋅‖ is the L-2 norm, or magnitude of the vector.

Fig. S1. Kit expression plots.
Kit expression was determined both by flow cytometry using the percent of cells that were cKit positive (% cKit+) and by mRNA expression of Kit in each sample. A) % cKit+ is plotted as a function of the AML state-space (PC1 coordinate). B) % cKit+ is plotted as a function of time. In both A) and B), sample trajectories are shown by connecting the time point samples for each mouse. C) %cKit+ and Kit mRNA measured as counts per million (cpm) are plotted for each sample to show general agreement between the two measures of Kit. D) For CM mice only, the Pearson correlation coefficient (ρ) was determined between %cKit+ and the expression of each miRNA. Then, the Pearson correlation coefficient was plotted as a function of the PC1 loading value of the miRNA. The result shows that miRNA that are more highly correlated with % cKit+ also tend to have a high contribution to AML (negative PC1 loading value) and vice versa.

Fig. S2. Top PCs correlated with Kit Expression.
All principal components (PCs) were tested for correlation with Kit expression. (A-D). The first four PCs, which were above the "elbow" of the singular value plot, are shown. PC1 showed the best correlation (R 2 =0.68; p < 0.001) of all PCs and was used to define the miRNA AML statespace.  (13). In this mouse model, cKit+ cells are an immunophenotypic marker for AML; there were no statistically significant differences between control and CM samples at either time point (unpaired t-test p=0.37 for t=0, p=0.82 for t=1). (B) When the same comparison between CM and control is made using the AML state-space component (PC1), the CM samples show a significant decrease in their PC1 component post-induction (unpaired t-test, p<0.01 for t=1). The change in PC1 indicates that the samples have started transitioning toward AML in the statespace. This supports the state-space representation (i.e. the miRNA transcriptome) as an early indicator of state-transition and is able to detect transition toward AML before any cKit+ cells can be detected.  The known critical points used in the simulation 1 , 2 , 3 are shown as solid lines and the estimators derived from A) are shown as box and whisker plots (red line is mean, box is inner quartile, and whiskers are outer quartiles) of the distribution of samples. We see that the mean of K1 is close to 1 , the boundary of K1 and K2 is closest to 2 and the mean of K3 is closest to 3 , supporting our approach to use these as estimators for the critical points in our model. Actual data is less densely sampled in the state-space and produces slightly different estimators.    The miRNA state-space is constructed using different dimensionality reduction algorithms: UMAP, t-SNE, and Diffusion Map to compare with the state-space constructed with the singular value decomposition. For each algorithm, plots were made showing both the first two components and the component that best separates the AML samples is plotted versus time. The SVD was selected to create the state-space and for analysis in the main text because it is a linear  Table S2.